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Abstract 

We introduce the graphlet decomposition of a weighted network, which encodes a notion 

of social information based on social structure. We develop a scalable inference algorithm, 

which combines EM with Bron-Kerbosch in a novel fashion, for estimating the parameters 

of the model underlying graphlets using one network sample. We explore some theoretical 

properties of the graphlet decomposition, including computational complexity, redundancy 

'~~' and expected accuracy. We demonstrate graphlets on synthetic and real data. We analyze 

^—1 messaging patterns on Facebook and criminal associations in the 19th century. 
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1 Introduction 

In recent years, there has been a surge of interest in social media platforms such as Facebook and 
Twitter, collaborative projects in the spirit of Wikipedia, and services that rely on the social struc- 
ture underlying these services, including cnn.com's "popular on Facebook" and nytimes.com's 
"most emailed". As these platforms and services have been gaining momentum, efforts in the 
computational social sciences have begun studying patterns of behavior that result in organized so- 
cial structure and interactions (e.g., see Lazer et al., 2009). Here, we develop a new tool to analyze 
data about social structure and interactions routinely collected in this context. 

There is a rich literature of statistical models to analyze binary interactions or networks (Gold- 
enberg et al., 2010). However, while many interesting interaction data sets involve weighted mea- 
surements on pairs on individuals, arguably, only a few of the existing models are amenable to 
analyze the resulting weighted networks. We consider situations where we observe an undirected 
weighted network, encoded by a symmetric adjacency matrix with integer entries and diagonal 
elements equal to zero. Our modeling approach is to decompose the graphon A, which defines an 
exchangeable model (Kallenberg, 2005) for integer-valued measurements of pairs of individuals 
P(F| A), in terms of a number of basis matrices that grows with the size of the network. 

The factorization of A is related to models for binary networks based on the singular value decom- 
position and other factorizations (Hoff, 2009; Kim and Leskovec, 2010). However, we abandon 
the popular orthogonality constraint (Jiang et al., 201 1) among the basis matrices PjS to attain in- 
terpretability in terms of multi-scale social structure, and we chose not model zero edge weights 
to enable the estimation to scale linearly with the number of positive weights (Leskovec et al., 
2010). These choices often lead to a non-trivial inferential setting (Airoldi and Haas, 2011). Re- 
lated methods utihze a network to encode inferred dependence among multivariate data (Coifman 
and Maggioni, 2006; Lee et al., 2008). We term our method "graphlet decomposition", as it is 
reminiscent of a wavelet decomposition of a graph. An unrelated literature uses the term graphlets 
to denote network motifs (Przulj et al., 2006; Kondor et al., 2009). 

The key features of the graphlet decomposition we introduce in this paper are: the basis matrices 
PjS are non-orthogonal, but capture overlapping communities at multiple scales; the information 
used for inference comes exclusively from positive weights; parameter estimation is linear in the 
number of positive weights, and avoids dealing with permutations of the input adjacency matrix Y 
by inferring the basis elements Pi from data. 



The basic idea is to posit a Poisson model for the edge weights P{Y\A) and parametrize the rate 
matrix A in terms of a binary factor matrix B. The binary factors can be interpreted as latent fea- 
tures that induce social structure through homophily. The factor matrix B allows for an equivalent 
interpretation as basis matrices, which define overlapping cliques of different sizes. 

Inference is carried out in two stages: first we identify candidate basis matrices using Bron- 
Kerbosch, then we estimate coefficients using EM. The computational complexity of the inference 
is addressed in Section 3.2. 



2 Graphlet decomposition 

Consider observing an undirected weighted network, encoded by a symmetric adjacency matrix 
with integer entries and diagonal elements equal to zero. In the derivations below, we avoid nota- 
tion whose sole purpose is to set to zero diagonal elements of the resulting matrices. 

2.1 Statistical model 



Intuitively, we want to posit a data generating process that can explain edge weights in terms of 
social information, quantified by community structure at multiple scales and possibly overlapping. 
We choose to represent communities in terms of their constituent maximal cliques. 

Definition 1. The graphlet decomposition (GD) of a matrix A with non-negative entries Xij is 
defined as K = BW B', where B is an N x K binary matrix, W is a K x K diagonal matrix. 
Explicitly, we have 
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The basis matrix B can be interpreted in terms overlapping communities at multiple scales. To 
see this, denote the i-th column of the matrix B as 6.,. We can re-write A = X]i=i l^i Pi^ where 
each Pj is an A^ X A^ matrix defined as Pi = b.i h'.^, in which the diagonal is set to zero. We refer 
to Pi as basis elements. Think of the z-th column of B as encoding a community of interest; then 



the vector b.i specifies which individuals are members of that community, and the basis element 
Pi is the binary graph that specifies the connections among the member of that community. The 
same individual may participate to multiple communities, and the communities may span different 
scales. 

The model for an observed network Y is then 

This is essentially a binary factor model with a truncated Poisson link and two important nu- 
ances. Standard factor models assume that entire rows of the matrix Y are conditionally indepen- 
dent, even when the matrix is square and row and column i refer to the same unit of analysis. In 
contrast, our model treats the individual random variables yij as exchangeable, and imposes the re- 
striction that positives entries in the factors map (in some way) to maximal cliques in the observed 
network. In addition, the matrix Y is sparse; our model implies that zero edge weights carry no 
information about maximal cliques, and thus are not relevant for estimation. These nuances — 
map between cliques and factors, and information from positives entries only — ^produce a new and 
interesting inferential setting that scales to large weighted networks. 

2.2 Inference 

Computing the graphlet decomposition of a network Y involves the estimation of a few critical 
parameters: the number of basis elements K, the basis matrix B or equivalently the basis elements 
Pi:K, and the coefficients associated with these basis elements fii.K- 

We develop a two-stage estimation algorithm. First, we identify a candidate set of K^ basis 
elements, using the Bron-Kerbosch algorithm, and obtain a candidate basis matrix B^. Second, we 
develop an expectation-maximization (EM) algorithm to estimate the corresponding coefficients 
/ii;i<-c; several of these coefficients will vanish, thus selecting a final set of basis elements. We 
study theoretical properties of this estimation algorithm in Section 3. Figure 1 illustrates the steps 
of the estimation process on a toy network. 

2.2.1 Identifying candidate basis elements 

The algorithm to identify the candidate basis elements proceeds by thresholding the observed 
network F at a number of levels, t = max(F) . . . min(F), and by identifying all the maximal 
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Figure 1: An illustration of the two-stage estimation algorithm for the graphlet decomposition. 

cliques in the corresponding sequence of binary networks, Y^^^ = 1{Y > t), using the Bron- 
Kerbosch algorithm (Bron and Kerbosch, 1973). The resulting cumulative collection of K'^ max- 
imal cliques found in the sequence of networks Y^^^ is then turned into candidate basis elements, 
P^, f or i = 1 . . . K'^. Algorithm 1 details this procedure. 

B'^ = empty basis set 
For t = max(F) to min(F) 

yw = i(r > t) 

C = maximal cliques(F'^*)) using Bron-Kerbosch 

5^ = 5^ u c 

Algorithm 1: Estimate a basis matrix, B^, encoding candidate elements, P"^, from a weighted 
network, Y. 



Algorithm 1 allows us to avoid dealing with permutations of the input adjacency matrix Y by 
inferring the basis elements Pi independently of such permutation. 

In Section 3.1 we show that, under certain conditions on the true underlying basis matrix B, 
Algorithm 1 finds a set of candidate basis elements that contains the true set of basis elements. 
In addition, in Section 3.2 we show that, under the same conditions on B and additional realistic 
assumptions, we expect to have a number of candidate basis elements of the same order of magni- 
tude of the true number of basis elements, K'^ = 0{K), as the network size A^ grows, given that 
the maximum weight is stationary, max(F) = 0(1). 



2.2.2 Sparse Poisson deconvolution 

Given the candidate set of basis elements encoded in B'', we develop an EM algorithm to estimate 
the corresponding coefficients /ii:i^c. Under certain conditions on the true basis matrix B, this 
algorithm consistently estimates the coefficients by zeroing out the coefficients associated with the 
unnecessary basis elements. 

Recall that we have K'^ candidate basis elements P{.k<=- Let's define a set of statistics T^ = 
Yliij Pkij for A; = 1 . . . K^, and let's introduce a set of latent N x N matrices Gk vvith positive 
entries, subject to the only constrain that ^^ Gk = Y. Algorithm 2 details the EM iterative 
procedure that will lead to maximum likelihood estimates for the coefficients fXi:K'=- 

Initialize /i^°) 

While||/i(*+i)-/iW|| >e 
For k=l to K 



.(i+l) _ , W^^tPmi 

M-step:/.r) = E,,G(:;^^; 



E-step: G^t^^ = ^^': — ^-«^, 



t=t+l; 

Algorithm 2: Estimate non-negative coefficients, yU, for all candidate elements, P'^, in the basis 
matrix B'^. 

This is the Richardson-Lucy algorithm for Poisson deconvolution (Richardson, 1972). The trun- 
cated Poisson likelihood can also be maximized directly, using a KL divergence argument (Csiszar 
and Shields, 2004) involving the discrete probability distribution obtained by normalizing the data 
matrix, Y, and a linear combination of discrete probability distributions obtained by normalizing 
the candidate basis matrices, J2k ^kPk- 



3 Theory 

Here we develop some theory for graphlets when the graph Y is generated by a non-expandable 
collection of basis matrices. The extent to which these results extend to the general case is un- 
clear, as of this writing, however, the results help form some intuition about how graphlets encode 
information. See the Appendix for details of the proofs. We begin by establishing some notation. 

Definition 2. Given two adjacency matrices Pi and P2 both binary define Pi C Pg to mean 
P2{j, k) = 1 whenever Pi(j, k) = 1; i.e. the induced graph by Pi is a subgraph 0/P2. 

Definition 3. Define P = \/i^jPi to mean P{j,k) = 1 whenever any Pi{j,k) = 1 and zero 
otherwise. 



Definition 4. Given {Pk}, we say that P' is an expansion of Pk if both 2 and 3 are satisfied. 

Definition 5 (non expandable basis). A collection of {Pk} is non-expandable if for any expansion 
P' of Pk we can find j such that P' C Pj. 

A consequence of the above definition is that, if a set of basis is non-expandable, then any subset 
of it is also non-expandable. 



3.1 Identifiability 

The first result asserts that the collection of Poisson rates A can be uniquely decomposed into a 
collection of basis matrices Pi.k, here encoded by the equivalent binary matrix B, with weights 

W = diag(/ii;i^). 

Theorem 1. Let A be a symmetric N x N nonnegative matrix which is generated by a non- 
expandable basis. Then there exists a unique N x K matrix B and K x K matrix W such that 
A = BWB'. 

This implies that, in the absence of noise, the set of parameters B,W, K are estimable from 
the data Y. The proof of Theorem 1 is constructive by means of Algorithm 3, which is fast but 
sensitive to noise. In practice, we propose a more robust algorithm that uses non-expandability 
to identify a collection of candidate basis elements B^, which provably includes the unique non 
expandable basis B that generated the graph. 

Theorem 2. Let A = BWB', where W is a diagonal matrix with nonnegative entries and B 
encodes a non-expandable basis, both unknown. IfB^ is the output of Algorithm 1 with A as input, 
then B C B^. 

The two theorems above may be used to generate random cliques, for instance, by requiring the 
entries of the matrix B to be IID Bernoulli random variables. 



3.2 Redundancy and complexity 

Here, we derive an upper bound for the maximum number of candidate basis elements encoded 
in B"^ in a network with at most K = clog2 A^ cliques. Networks with this many cliques include 
networks generated with most exchangeable graph models (Goldenberg et al., 2010), and are the 
largest networks with an implicit representation (Kannan et al., 1988). 
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Theorem 3. Let the elements Bik of the basis matrix for a network Y be IID Bernoulli random 
variables with parameter p^- Then an asymptotic upper bound for the number of candidate basis 
elements, denoted Cn,pi^ , identified by Algorithm 1 is 

Cn,p. < g(2^^(^-) + K) 

= g(iV^i^(P^) + clog2Ar), (2) 

where Q is the number of thresholds in Algorithm 1. 

While Theorem 3 applies in general, a notion of redundancy of the candidate basis set is well 
defined only for networks generated by a non expandable basis, in which B^ (1 B.ln this case, we 
define redundancy as the ratio Rm = Cn,p,^/K. Theorem 2 states that number of candidate basis 
elements is never smaller than the true number of basis K. Thus Theorem 3 leads to the following 
upper bound on redundancy 

In a realistic scenario we may have that piy = 0(l/log2 A^), the complexity of the candi- 
date basis set is 0{Qlog2N), and the implied redundancy is at most 0{Q). Alternatively, if 
Pn = 0{l/K), the implied redundancy is at most 0{QK). These are regimes we often encounter 
in practice. These limiting behaviors of p^ arise whenever the nodes in a network can be as- 
sumed to have a capacity that is essentially independent of the size of the network, e.g., individuals 
participate in the activities of a constant number of groups over time. 

These calculations are only suggestive for networks generated by an expandable basis. How- 
ever, non expandability tends to be a reasonable approximation in many situations, including when- 
ever exchangeable graph models are good fit for the network. In the analysis of messaging patterns 
on Facebook in Section 4.3, for instance, we empirically observe Cn,p ~ 2i^. 

3.3 Accuracy with less than K basis elements 

The main estimation Algorithm 2 recovers the correct number of basis elements K and the corre- 
sponding coefficients fj, and basis matrix B, whenever the true weighted network Y is generated 
from a non expandable basis matrix. Here we quantify the expected loss in reconstruction accuracy 
if we were to use K < K basis elements to reconstruct Y . To this end we introduce a norm for a 
network Y , and related metrics. 

Definition 6 (r-norm). Let Y ~ Poisson{BW B'), where W = diag{fj.i . . . ^k)- Define the 



statistics ak = J2i=i Bikfor k = 1 . . . K. The r-norm ofY is defined as t{Y) = \ J2k=i /^fcOfcl- 

Consider an approximation Y = BWB' characterized by an index set £■ C {1 . . . K}, which 
specifies the basis elements to be excluded by setting the corresponding set of coefficients ixe to 
zero. Its reconstruction error is t(Y — Y) = \ '^^tE f^k'^k], and its reconstruction accuracy is 
r{Y)/r(Y). Thus, given a network matrix Y representable exactly with K basis elements, the best 
approximation with K basis elements is obtained by zeroing out the lowest K — K coefficients /z. 

We posit the following theoretical model, 

/ifc ~ Gamma{a + (3, 1) (4) 

ttk/N ~ Beta{a,l3) (5) 

fJ-k'^-k/N ~ Gammala,!), (6) 

for A; = 1 . . . K. This model may be used to compute the expected accuracy of an approximate 
reconstruction based on K < K basis elements, since the magnitude of the ordered fik coefficients 
that are zeroed out are order statistics of a sample of Gamma variates (Shawky and Bakoban, 
2009). Given K and a, we can compute the expected coefficient magnitudes and the overall 
expected accuracy tq. 

Theorem 4. The theoretical accuracy of the best approximate Graphlet decomposition with K out 
of K basis elements is: 

MK,K,a) = }_^ ^^ , (7) 

where 

(a-l)(iC-l) , . 

/(l'^'«) = ff^ E '^rnia^K-l)^^^, (9) 

in which the coefficients Cm are defined by the recursion 

i=a—l ^ 

Cm{a,q)= y^ -Cm-i{(y,q-l) (10) 

i=0 

with boundary conditions Cm(a, 1) = ^ for i = 1 . . .a. 
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Figure 2: Theoretical and empirical accuracy for different fractions of basis elements K /K with 
a = 0.1. The ratio K/K also provides a measure of sparsity. 

Figure 2 illustrates this result on simulated networks. The solid (red) line is the theoretical 
accuracy computed for K = 30 and a = 1, the relevant parameters used to to simulate the sample 
of 100 weighted networks. The (blue) boxplots summarize the empirical accuracy at a number of 
distinct values of K/K. 



4 Results 

We evaluate graphlets on real and simulated data. Simulation results on weighted networks in 
Section 4.1 show that the estimation problem is well-posed and both the binary matrix B and the 
coefficients fi are estimable without bias, while results on binary networks in Section 4.2 enable 
the analysis of the comparative performance with existing methods for binary networks. We also 
develop an analysis of messaging patterns on Facebook for a number of US colleges and an analysis 
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of historical crime data in Sections 4.3-4.4. 

Overall, these results suggest that graphlets encode a new quantification of social information; 
we provide a concrete illustration of this idea in Section 5.1. 



4.1 Parameter estimation 

Here, we evaluate whether the parameters underlying graphlets — the entries of binary matrix B, 
its size K, and the non-negative coefficients fi — are estimable from 1 weighted network sample Y. 

We simulated M = 100 networks from the model, each with 50 nodes, using random values for 
the parameters /i, B and K. For the i-th network, the number of basis elements Ki was sampled 
from a Poisson distribution with rate A = 30. The coefficients fiji were sampled from a Gamma 
distribution with parameters a = 1 and (3 = 10. The entries of the basis matrix B were sampled 
from a Bernoulli distribution with probability of success p = 0.04. The index i = 1 ... A/ runs 
over the networks and the index j = 1 . . . Ki runs over the basis elements for the z-th network. 

Algorithms 1-2 were used to estimate the parameters B, fi and K, for each synthetic network. 
The estimation error oi K h \K — K\. Since the estimated matrix B is unique only up to a 
permutation of its columns, the Procrustes transform (Kendall, 1989) was used to identify the 
optimal rotation of the matrix B to be compared to the true matrix B. The estimation error of B is 
computed as the normalized L2 distance between B and B after Procrustes. The estimation error of 
fi is computed as the normalized L2 distance between /i and fi. The average error in reconstructing 
the simulated networks is computed as the Li distance between Y and Y normalized to the total 
number of counts in the network, averaged over 100 simulations, denoted |F|i. This metric gives 
more weight to larger cliques, since the clique size enters as a quadratic term. We also provide 
the average error in reconstructing the simulated networks based on the r-norm, which is less 
sensitive to incorrectly estimating larger cliques. In addition, we compute the average error in 
reconstructing whether or not edges have positive weights, rather than their magnitude, denoted 
error in 1(F > 0). 

Table 1 summarizes the simulation results for weighted network reconstructions obtained by 
setting a target accuracy tq, ranging from 0.20 to 1.00 — no error. The last row of Table 1 supports 
the claim that the estimation problem is well-posed; the estimation algorithm stably recovers the 
true parameter values. The only sources of non-zero error are the estimation of the number of 
basis elements and the estimation of the basis elements themselves, as encoded by the matrix B. 
However, these errors are negligible and do not seem to impact the estimation of the coefficients /i, 
nor to propagate to the network reconstruction independently of the metric we use to quantify it. 
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Table 1: Performance of the estimation algorithm for a target accuracy tq, on synthetic data. 



Y\i error 


t{Y) error 


error in l(r > 0) 


error in B 


error in /i 


k/k 


-To 


0.97 ±0.012 


0.65 ± 0.074 


0.76 ±0.062 


50.75 ± 10.893 


0.55 ±0.082 


0.08 ±0.035 


0.20 


0.92 ±0.016 


0.46 ± 0.026 


0.61 ±0.053 


42.40 ± 8.134 


0.32 ±0.041 


0.16 ±0.039 


0.50 


0.79 ±0.036 


0.23 ±0.012 


0.40 ±0.057 


29.14 ± 5.536 


0.10 ±0.015 


0.33 ±0.050 


0.75 


0.66 ±0.054 


0.14 ±0.008 


0.30 ±0.053 


21.72 ± 4.441 


0.04 ± 0.009 


0.45 ±0.060 


0.85 


0.56 ±0.063 


0.09 ±0.006 


0.24 ±0.048 


17.04 ± 3.757 


0.02 ± 0.004 


0.54 ±0.062 


0.90 


0.41 ±0.064 


0.04 ± 0.004 


0.17 ±0.041 


11.36 ± 2.980 


0.00 ±0.002 


0.67 ±0.061 


0.95 


0.19 ±0.051 


0.01 ±0.002 


0.07 ±0.024 


4.46 ± 1.646 


0.00 ±0.000 


0.85 ± 0.043 


0.99 


0.00 ±0.000 


0.00 ±0.000 


0.00 ±0.000 


0.67 ± 0.618 


0.00 ±0.000 


1.01 ±0.033 


1.00 



The average bias in estimating the optimal number of basis elements K is 0.07, with a standard 
deviation of 0.256. Column k/k provides the fraction of the estimated optimal number of basis 
elements k that is necessary to achieve the desired target accuracy, in terms of the r-norm, using 
Theorem 4. Column two provides the empirical t(Y) error, defined as one minus the accuracy, 
corresponding to the reported fraction of basis elements. The empirical accuracy matches the 
expected accuracy given by Theorem 4 well. For instance, the Theorem 4 implies that we need 
45% of the basis elements in order to achieve an accuracy of 0.85, and the empirical accuracy is 
slightly in excess of 0.85, on average. Figure 2 shows similar results, for a larger set of 29 values 
for the target accuracy tq. 

The first two columns of Table 1 report the average Li error and the average r-based error, 
properly normalized to have range in zero to one. Cliques in the simulated networks range between 
2 and 5 nodes, for most networks. The differences in the reconstruction error are due to the weight 
of these larger cliques. For instance, the empirical Li accuracy obtained by using 45% of the basis 
elements is around 0.65, as opposed to a r accuracy of around 0.85, on average. 

The accuracy in reconstructing the presence or absence of edges in the network, rather than the 
magnitude of the corresponding weights, is in between the Li accuracy and the r accuracy, for any 
fraction k/k. 

4.2 Link reconstruction 

Graphlet is a method for decomposing a weighted network. However, here we evaluate how 
graphlet fares in terms of the computational complexity versus accuracy trade-off when compared 
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Table 2: Link reconstruction accuracy and runtime for a number of competing methods, on simu- 
lated data. The graphlet decomposition is denoted Glet, with the fraction of basis elements used 
quoted in brackets. 



Method 


Runtime (sec) 


Accuracy 


Glet (25%) 


0.0636 


92.7 ±0.30 


Glet (50%) 


0.0636 


94.7 ±0.15 


Glet (75%) 


0.0636 


97.0 ±0.08 


Glet (90%) 


0.0636 


98.9 ±0.05 


Glet (100%) 


0.0636 


100.0 ±0.00 


ERG 


0.0127 


86.0 ±1.20 


DDS 


11.1156 


89.0 ±0.85 


LSCM 


6.3491 


90.0 ±0.70 


MMSB 


2.8555 


93.0 ±0.40 



to the performance current methods for binary networks can achieve. 

For this purpose, we generated 100 synthetic networks from the model, each with 100 nodes, 
choosing random values for the parameters as we did in Section 4.1. The number of basis was 
sampled from a Poisson with rate A = 12. The coefficients were sampled from a Gamma with 
parameters a = 2 and (3 = 10. The entries of the basis matrix were sampled from a Bernoulli with 
probability of success p = 0.1. We then obtained the corresponding binary networks by resetting 
positive edge weights to one, 1{Y > 0). 

We compute comparative performance for a few popular models of networks that can be used 
for link prediction. These include the Erdos-Renyi-Gilbert random graph model (Erdos and Renyi, 
1959; Gilbert, 1959), denoted ERG, a variant of the degree distribution model fitted with im- 
portance sampling (Blitzstein and Diaconis, 2010), denoted DDS, the latent space cluster model 
(Handcock et al., 2007), denoted LSCM, and the stochastic blockmodel with mixed membership 
(Airoldi et al., 2008), denoted MMSB. A comparison with singular value decomposition in terms 
binary link prediction is problematic, since using a few singular vectors would not lead to binary 
edge weights. 

Table 2 summarizes the comparative performance results. The accuracy of graphlet is reported 
for different fractions of the estimated optimal number of basis elements K, ranging from 25% 
to 100% — no error. The accuracy is consistently high, while the running time is a only a fraction 
of a second, independent of the desired reconstruction accuracy. This is because the complexity 
of computing the entire graphlet decomposition is linear in the number of edges present in the 
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network. Thus specifying a smaller fraction of the number of basis elements is a choice based on 
storage considerations, rather than on runtime. The comparative performance results suggest that 
graphlet decomposition is very competitive in predicting links. 

This binary link prediction simulation study is one way to compare graphlet to interesting ex- 
isting methods, which have been developed for binary graphs. 

4.3 Analysis of messaging on Facebook 

Here we illustrate graphlet with an application to messaging patterns on Facebook. We analyzed 
the number of public wall-post on Facebook, over a three month period, among students of a 
number of US colleges. While our data is new, the US colleges we selected have been previously 
analyzed (Traud et al., 2011). 

Table 3 provides a summary of the weighted network data and of the results of the graphlet 
decomposition. Salient statistics for each college include the number of nodes and edges. The table 
reports the number of estimated basis elements K for each network and the runtime, in seconds. 
The t{Y) error incurred by using a graphlet decomposition is reported for different fractions of the 
estimated optimal number of basis elements K, ranging from 10% to 100% — no error. 

Overall, these results suggest that the compression of wall-posts achievable on collegiate net- 
work is substantial. A graphlet decomposition with about 10% of the optimal number of basis 
elements already leads to a reconstruction error of 10% or less, with a few exceptions. Using 25% 
of the basis elements further reduces the reconstruction error to below 5%. 



4.4 Analysis of historical crime data 

More recently, we have used the graphlet decomposition to analyze crime associations records 
from the 19th century. These records are available at the South Carolina Department of Archives 
and History (Columbia, SC), Record Group 44, Series L 44158, South Carolina, Court of Gen- 
eral Sessions (Union County) Indictments 1800-1913. Nodes in the network are 6221 residents of 
Union County, during the years 1850 to 1880. Edge weights indicate the number of crimes involv- 
ing pairs of residents. Details on the structure of this data set and its historical relevance may be 
found in Frantz-Parsons (2005, 201 1). 

A first step in the analysis was to explore the degree to which there is an optimal level of granu- 
larity, for the observed associations, at which a non-trivial criminal structure emerges. We pursued 
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Table 3: t{Y) error for different fractions of the estimated optimal number of basis elements K. 



college 


nodes 


edges 


K 


sec 


10% 


25% 


50% 


75% 


90% 


100% 


American 


6386 


435323 


11426 


151 


13.5 


6.5 


1.80 


.70 


.30 





Amherst 


2235 


181907 


10151 


124 


6.5 


2.3 


.84 


.33 


.14 





Bowdoin 


2252 


168773 


9299 


113 


9.0 


3.2 


1.17 


.41 


.17 





Brandeis 


3898 


275133 


10340 


116 


6.8 


2.9 


1.21 


.53 


.25 





Bucknell 


3826 


317727 


13397 


193 


7.8 


2.9 


1.15 


.45 


.20 





Caltech 


769 


33311 


3735 


51 


5.7 


1.8 


.65 


.27 


.11 





CMU 


6637 


499933 


11828 


169 


14.9 


5.2 


1.89 


.73 


.34 





Colgate 


3482 


310085 


12564 


151 


8.0 


3.3 


1.26 


.45 


.19 





Hamilton 


2314 


192787 


11666 


200 


6.9 


2.5 


.84 


.33 


.15 





Haverford76 


1446 


119177 


9021 


128 


4.8 


2.2 


.74 


.25 


.10 





Howard 


4047 


409699 


12773 


170 


8.6 


3.7 


1.55 


.60 


.28 





Johns Hopkins 


5180 


373171 


11674 


150 


10.8 


3.7 


1.40 


.58 


.29 





Lehigh 


5075 


396693 


14076 


206 


9.5 


3.2 


1.14 


.49 


.23 





Michigan 


3748 


163805 


5561 


54 


11.4 


4.6 


1.92 


.76 


.35 





Middlebury 


3075 


249219 


9971 


109 


9.7 


3.5 


1.35 


.49 


.22 





MIT 


6440 


502503 


13145 


191 


11.5 


4.7 


1.68 


.65 


.30 





Oberlin 


2920 


179823 


7862 


84 


9.8 


3.8 


1.48 


.57 


.26 





Reed 


962 


37623 


3911 


46 


6.0 


2.3 


.99 


.40 


.17 





Rice 


4087 


369655 


12848 


155 


8.7 


3.2 


1.25 


.51 


.22 





Rochester 


4563 


322807 


10824 


124 


11.0 


3.7 


1.43 


.56 


.26 





Santa 


3578 


303493 


11203 


127 


10.3 


3.5 


1.30 


.51 


.24 





Simmons 


1518 


65975 


5517 


60 


6.4 


2.5 


1.04 


.44 


.19 





Smith 


2970 


194265 


8591 


102 


5.5 


2.5 


1.15 


.49 


.23 





Swarthmore 


1659 


122099 


7856 


96 


6.3 


2.6 


1.06 


.44 


.20 





Trinty 


2613 


223991 


10832 


131 


8.6 


3.0 


1.07 


.40 


.18 





Tufts 


6682 


499455 


14641 


212 


13.9 


4.8 


1.68 


.65 


.30 





UC Berkeley 


6833 


310663 


7715 


105 


16.3 


6.2 


2.28 


.90 


.42 





U Chicago 


6591 


416205 


12326 


176 


14.2 


4.7 


1.74 


.66 


.31 





Vassar 


3068 


238321 


11344 


134 


9.3 


3.2 


1.18 


.47 


.21 





Vermont 


7324 


382441 


10030 


145 


17.5 


5.4 


2.05 


.82 


.36 





USFCA 


2682 


130503 


6735 


67 


10.5 


3.8 


1.50 


.60 


.26 





Wake Forest 


5372 


558381 


15580 


211 


11.5 


4.2 


1.46 


.58 


.25 





Wellesley 


2970 


189797 


9768 


107 


8.9 


3.3 


1.24 


.48 


.22 





Wesleyan 


3593 


276069 


10506 


118 


9.9 


3.6 


1.40 


.54 


.25 
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Figure 3: A two-step analysis of the Union County criminal association network using graphlets 
reveals tight pockets of criminality (the numbered nodes) and local criminal communities (the 
lettered nodes). 
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this analysis by fitting graphlets to the raw criminal association network raised to different powers, 
Y'' for k = 1 ... 5. We found that F^ contained a substantial degree of criminal social structure. 
At this resolution, k = 2, two residents are associated with crimes in which they are either directly 
involved, or in which they are involved indirectly through their direct criminal associates. The 
results suggest that there are several tight pockets of criminality in Union County. 

We were interested in developing an interpretation for these pockets of criminality. One com- 
pelling hypothesis is that each pocket captures a gang or a clan. If this holds, we might expect the 
variation in the gangs' relations to be explained by the spatial organization of the townships they 
operate in, to a large degree. 

To test this hypothesis, we built a meta network of criminal associations, in which each gang 
identified in the first step of the analysis is represented by a meta node. Two or more gangs were 
connected in the meta network whenever a criminal record involved at least one of their members. 
A graphlet decomposition of the meta network identified pockets criminal activity at the gang 
level. Plotting these gang level criminal associations reveals the townships' spatial organization, to 
a large degree. Figure 3 shows the pockets of criminality identified by the graphlet analysis of the 
original criminal associations, Y'^, as green nodes associated with numbers, as well as the pockets 
of criminality identified by the graphlet analysis of the meta network, as purple nodes associated 
with letters. 

In summary, we were abel to successfully use graphlets as an exploratory tool to capture and 
distill an interesting, multi-scale organization among criminals in 19th century Union County, from 
historical records. 



5 Discussion 

Taken together, our results suggest that the graphlet decomposition implies a new notion of so- 
cial information, quantified in terms of multi-scale social structure. We explored this idea with a 
simulation study. 

5.1 A new notion of "social information" 

Graphlet quantifies social information in terms of community structure (i.e., maximal cliques) 
at multiple scales and possibly overlapping. To illustrate this notion of social information, we 
simulated 200 networks: 100 Erdos-Renyi-Gilbert random graphs with Poisson weights and 100 
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networks from the model described in Section 2. While, arguably, the Poisson random graphs do 
not contain any such information, the data generating process underlying graphlets was devised to 
translate such information into edge weights. 

As a baseline for comparison, we consider the singular value decomposition (Searle, 2006), 
applied to the symmetric adjacency matrices with integer entries encoding the weighted networks. 
The SVD summarizes edge weights using orthogonal eigenvectors Vi as basis elements and the 
associated eigenvalues Aj as weights, Y^xn = J2i=i \viv[. However, SVD basis elements are 
not interpretable in terms of community structure, thus SVD should not be able to capture the 
notion of social information we are interested in quantifying. 

We applied graphlets and SVD to the two sets of networks we simulated. Figure 4 provides an 
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Figure 4: Comparison between graphlet and SVD decompositions. Top panels report results for 
on the networks simulated from the model underlying graphlets. Bottom panels report results on 
the Poisson random graphs. 
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overview of the results. Panels in the top row report results for on the networks simulated from the 
model underlying graphlets. Panels in the bottom row report results on the Poisson random graphs. 
In each row, the left panel shows the box plots of the coefficients associated with each of the basis 
elements, for graphlets in blue and for SVD in red. The right panel shows the box plots of the 
cumulative representation error as a function of the number of basis elements utilized. Graphlets 
coefficients decay more slowly than SVD coefficients on Poisson graphs (bottom left). Because 
of this, the error in reconstructing Poisson graphs achievable with graphlets is consistently worse 
then the error achievable with SVD (bottom right). In contrast, graphlets coefficients decay rapidly 
to zero on networks with social structure, much sharply then the SVD coefficients (top left). Thus, 
the reconstruction error achievable with graphlets is consistently better then the error achievable 
with SVD (top right). 

These results support our claim that graphlets is able to distill and quantify a notion of social 
information in terms of social structure. 



5.2 Concluding remarks 

The graphlet decomposition of a weighted network is an exploratory tool, which is based on a 
simple but compelling statistical model, it is amenable to theoretical analysis, and it scales to large 
weighted networks. 

SVD is the most parsimonious orthogonal decomposition of a data matrix Y (Hastie et al., 
2001). Graphlets abandon the orthogonality constraint to attain interpretability of the basis matrix 
in terms of social information. In the presence of social structure, graphlet is a better summary 
of the variability in the observed connectivity. SVD always achieves a lower reconstruction error, 
however, when a few basis elements are considered (see Figure 4) suggesting that orthogonality 
induces a more useful set of constraints whenever a very low dimensional representation is of 
interest. 

Graphlets provide a parsimonious sumary of complex social structure, in practice. The sim- 
ulation studies in Sections 4.1 and 5.1 suggest that graphlets are leveraging the social structure 
underlying messaging patterns on Facebook to deliver the high compression/high accuracy ratios 
we empirically observe in Table 3. 

The information graphlets utilize for inference comes exclusively from positive weights in the 
network. This feature is the source of desirable properties, however, because of this, graphlets 
are also sensitive to missing data; zero edge weights cannot be imputed. More research is needed 
to address this issue properly, while maintaining the properties outlined in Section 3. Empirical 
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solutions include considering powers of the original network, as in Section 4.4, or cumulating 
weights over a long observation window, as in Section 4.3. Intuitively, cumulating messages over 
long time windows will reveal a large set of basis elements. Algorithm 2 can then be used to project 
message counts over a short observation window onto the larger set of basis elements to estimate 
which elements were active in the short window and how much. Recent empirical studies suggest 
three months as a useful cumulation period (Cortes et al., 2001; Kossinets and Watts, 2006). 
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A Proofs of theorems and lemmas 

In the following, A o C denotes the element- wise (Hadamard) product of two matrices A and C. 

A.l Proof of Theorem 1 

Proof. We will show B can be recovered from A using the deterministic algorithm 3. 

In each step, due to non-expandability, the maximal clique of A', C, will not be a subset of any 
other clique and will contain an edge which only belongs to C, using Lemma 1 below. The weight 
of this edge in A' will be fic the coefficient of clique C in the composition. Also, edges of clique 
C in the network will not have weights less than /Xc because all the coefficients in composition(/is) 
are positive. So, in each step, the weight corresponding to the edge(s) unique to C will be correctly 
detected and subtracted from the network. 
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Set A' = A 
While(There is at least one non-zero element in A') 
He = mmij{AoC){iJ) 
ij=aTgmm,j{A o C){i,j) 
C=largest clique in A' that includes the edge ij^ 
A'=A' - /i,C 

Add (C, fic) to (B, W) respectively. 
Algorithm 3: Algorithm in theorem 1 for uniqueness of non expandable basis 

Hence, using algorithm 3 we can find basis B and coefficients W of matrix in A' = BWB^. 
Since the algorithm is deterministic given A, it will provide us only one set of unique NEB. D 

Lemma l.IfB generates a non-expandable basis set S, any clique C E S will contain an edge 
that does not belong to any other clique in S. 

Proof. Proof by contradiction: If every edge in C overlaps with another clique in S, then C can be 
decomposed to other cliques, which contradicts non-expandability of S. D 



A.2 Proof of Theorem 2 

Proof. Recall that Pk = b,kb\ and A = J2i l^iPi- For any k G {1, 2, .., -ft'}, setting a threshold on 
the edge weights of the network A, such that edges with weights above the threshold are selected, 
obtains a binary network. Define Gk to be those indices £ for which B^ covers B^- If the above 
mentioned threshold is set to thk = J2i(^g /^'' ^^^^ ^ binary network Ath=thk will be revealed. 
Below we will prove that P^ is a maximal clique for Ath=thk using Lemma 2. 

To show that P^ is a maximal cliques of Ath=thk we need to prove the following two statements: 

(1) Pk ^ Ath=thk which means clique Pk should be included in Ath=thh- This statement is true 
because the weights of the edges in A that correspond to P^, are greater than thk since /xs are 
positive. 

(2) Pk can not be part of a larger clique in Ath=thk ■ By contradiction, assume such a larger clique 
exists and name it P' then Pk C P' C Ath=thk ■ Considering that the basis is non-expandable, given 
lemma (2) and P' C Ath=thk ^ VieT\G ^h ^^^^ ^^ '-^^ ^^'^ 3 ^ T\G that P' C Pj. Hence, we 
have Pk C P' (^ Pj which leads to Pk C Pj for a j E T\G. This is clearly a contradiction since 
such j's are excluded in G. 

With Pfc's being one of the maximal cliques of thresholded binary networks at thk and con- 
sidering the fact that the threshold level will be equal to J^tei A** ^°^ ^^y ^ ^ {Gi,G2, ■■■, Gk} 
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(because miiijj Ajj < Xlze/ f^i ^ maxy Ajj ), it is clear that B C Be. This ends the proof of 
theorem (2). D 

Lemma 2. Ath=thk ^ Viervc^ Pi, for T = I... K. 

Proof. We will prove this by contradiction. Assume the above claim is not true. In that case, 
an edge in Ath=th^. can be found which belongs to complementary set: Viec -^A VierVG -^»- 
However, the weight for this edge is more than thk = J2ieG 1^^ which is not possible for edges 
included in VieG -^A yieT\G ^i because the maximum weight for edges obtained from this group 
of chques is at most thk = ZljeG ^^l■ ^^^ ^^ proof of the lemma (2). D 



A.3 Proof of Theorem 3 

Proof Using the definition of Graphlet, Pi, ..,Pk are cliques generated from matrix B. Con- 
sidering that Y = Yliil^iPi^ ^^y clique P' detected at different thresholds will be in the form of 
P' = /\i^j Pi for an / C {1, 2, .., K}. Each clique will be detected at most Q times. Then the 
total number of detected cliques will not be more than Q times 2^, the total number of possible 
Is. However, based on the distribution of elements of matrix B we know the typical set for Is 
have size of 2^^'^'p^\ asymptotically for large n, where H{.) is entropy function. This follows 
the fact that the bit strings for nodes will be in a typical set where each bit-string has Kp^ bits of 
one. Hence, the number of nonempty common intersections among a nontypical set of cliques will 
converge to zero and we will have Cn,p = Q2^^'^'p^'> + QK. D 
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